Policy Evaluation and Temporal-Difference Learning in Continuous Time and Space: A Martingale Approach

Abstract

We propose a unified framework to study policy evaluation (PE) and the associated temporal difference (TD) methods for reinforcement learning in continuous time and space. We show that PE is equivalent to maintaining the martingale condition of a process. From this perspective, we find that the mean--square TD error approximates the quadratic variation of the martingale and thus is not a suitable objective for PE. We present two methods to use the martingale characterization for designing PE algorithms. The first one minimizes a ``martingale loss function'', whose solution is proved to be the best approximation of the true value function in the mean--square sense. This method interprets the classical gradient Monte-Carlo algorithm. The second method is based on a system of equations called the ``martingale orthogonality conditions'' with ``test functions''. Solving these equations in different ways recovers various classical TD algorithms, such as TD($\lambda$), LSTD, and GTD. Different choices of test functions determine in what sense the resulting solutions approximate the true value function. Moreover, we prove that any convergent time-discretized algorithm converges to its continuous-time counterpart as the mesh size goes to zero. We demonstrate the theoretical results and corresponding algorithms with numerical experiments and applications.
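As a rough illustration of the first method, the sketch below minimizes a time-discretized martingale loss on a toy problem. Everything specific here is an assumption made for illustration and is not taken from the paper: the state is a driftless Brownian motion with a standard normal initial value, the running reward is $r(x) = x^2$, the terminal reward is zero, the horizon is $T = 1$, and the value function is parametrized linearly as $J^\theta(t,x) = \theta_1 x^2 (T-t) + \theta_2 (T-t)^2$ (for which the true value function corresponds to $\theta = (1, 0.5)$). Because $J^\theta$ is linear in $\theta$, the loss is minimized here in closed form by least squares rather than by the gradient Monte-Carlo iteration mentioned in the abstract. The function names `simulate_paths` and `fit_martingale_loss` are hypothetical.

```python
import numpy as np


def simulate_paths(n_paths, n_steps, T, rng):
    """Simulate dX = dW on a uniform grid (toy dynamics, an assumption)."""
    dt = T / n_steps
    x0 = rng.standard_normal(n_paths)  # random initial states
    dW = rng.standard_normal((n_paths, n_steps)) * np.sqrt(dt)
    X = np.concatenate([x0[:, None], x0[:, None] + np.cumsum(dW, axis=1)], axis=1)
    return X, dt  # X has shape (n_paths, n_steps + 1)


def fit_martingale_loss(X, dt, T):
    """Least-squares minimizer of the discretized martingale loss.

    Discretized loss: sum over paths and grid times t_k of
        (G_k - J_theta(t_k, X_k))^2 * dt,
    where G_k is the realized reward-to-go from t_k (terminal reward is 0).
    """
    n_steps = X.shape[1] - 1
    tau = T - np.arange(n_steps) * dt       # time to maturity at each t_k
    rewards = X[:, :-1] ** 2 * dt           # r(X_{t_j}) * dt, left endpoints
    # Reverse cumulative sum gives G_k = sum_{j >= k} r(X_{t_j}) * dt.
    G = np.cumsum(rewards[:, ::-1], axis=1)[:, ::-1]
    phi1 = X[:, :-1] ** 2 * tau             # feature x^2 (T - t)
    phi2 = np.broadcast_to(tau ** 2, phi1.shape)  # feature (T - t)^2
    A = np.stack([phi1.ravel(), phi2.ravel()], axis=1)
    theta, *_ = np.linalg.lstsq(A, G.ravel(), rcond=None)
    return theta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X, dt = simulate_paths(5000, 50, 1.0, rng)
    print(fit_martingale_loss(X, dt, 1.0))  # should land near (1, 0.5)
```

The estimate carries both Monte-Carlo noise and a small discretization bias from the left-endpoint Riemann sum, which shrinks as the mesh size decreases.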

Similar resources

Temporal Difference Learning in Continuous Time and Space

A continuous-time, continuous-state version of the temporal difference (TD) algorithm is derived in order to facilitate the application of reinforcement learning to real-world control tasks and neurobiological modeling. An optimal nonlinear feedback control law was also derived using the derivatives of the value function. The performance of the algorithms was tested in a task of swinging up a p...

Temporal Difference Learning in Continuous Time and Space

A continuous-time, continuous-state version of the temporal difference (TD) algorithm is derived in order to facilitate the application of reinforcement learning to real-world control tasks and neurobiological modeling. An optimal nonlinear feedback control law was also derived using the derivatives of the value function. The performance of the algorithms was tested in a task of swinging up a ...

Explicitation in Interlingual and Intralingual Translations of Shahnameh Ferdowsi: A Text Linguistic Approach

This study examines and compares the differences and similarities between intralingual and interlingual translation, with a focus on text linguistics. For the comparison, the frequency of explicitation used in the intralingual and interlingual translations of Ferdowsi's Shahnameh was examined.

The Relationship between Language and Social Capital in Ilami Kurdish: A Sociopragmatic Approach

Abstract: Language as a means of creating and rebuilding social capital has received attention over the past few decades. Although much has been written about social capital and its related constructs, very little research has examined how language can create trust or distrust. This study was carried out to achieve two aims. First, an attempt will be made to [develop] a typology of the vocabulary that the Kurdish-speaking people of the city of ...

A Frame Semantic Approach to the Study of Translating Cultural Scripts in Salinger's Franny and Zooey

Frame semantic theory is a nascent approach in translation studies that goes beyond linguistic barriers and helps incorporate cognitive and cultural factors into the study of translation. Based on Rojo's analytical model (2002b), which centers on the frames or knowledge structures activated in the text, the present research explores the various translation problems that...

Journal

Journal title: Social Science Research Network

Year: 2021

ISSN: 1556-5068

DOI: https://doi.org/10.2139/ssrn.3905379